Scalable methods for computing state similarity in deterministic Markov Decision Processes
We present new algorithms for computing and approximating bisimulation
metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an
elegant formalism that captures behavioral equivalence between states and
provide strong theoretical guarantees on differences in optimal behaviour.
Unfortunately, their computation is expensive and requires a tabular
representation of the states, which has thus far rendered them impractical for
large problems. In this paper we present a new version of the metric that is
tied to a behavior policy in an MDP, along with an analysis of its theoretical
properties. We then present two new algorithms for approximating bisimulation
metrics in large, deterministic MDPs. The first does so via sampling and is
guaranteed to converge to the true metric. The second is a differentiable loss
which allows us to learn an approximation even for continuous-state MDPs, which
prior to this work had not been possible.
Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20).
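For a rough sense of the computation involved: in a deterministic MDP, an on-policy bisimulation-style metric satisfies d(x, y) = |r(x) - r(y)| + γ·d(x', y'), where x' and y' are the successor states under the behavior policy. The sketch below iterates this operator to a fixed point on a tabular example; the names (`bisim_metric`, `rewards`, `nxt`) and the toy MDP are illustrative assumptions, not the paper's algorithm or code.

```python
import numpy as np

# A minimal sketch (not the paper's algorithm): fixed-point iteration for an
# on-policy bisimulation-style metric in a tabular, deterministic MDP.
# `rewards[s]` is the reward from state s under the behavior policy and
# `nxt[s]` its deterministic successor, both hypothetical placeholders.

def bisim_metric(rewards, nxt, gamma=0.9, tol=1e-8, max_iters=10_000):
    n = len(rewards)
    d = np.zeros((n, n))
    for _ in range(max_iters):
        # Operator: reward gap plus discounted distance between successors.
        d_new = np.abs(rewards[:, None] - rewards[None, :]) + gamma * d[nxt][:, nxt]
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d

# Tiny 3-state chain: states 0 and 1 share rewards and successors,
# so the metric judges them behaviorally identical (distance 0).
rewards = np.array([1.0, 1.0, 0.0])
nxt = np.array([2, 2, 2])
print(bisim_metric(rewards, nxt))
```

The operator is a γ-contraction, which is the standard route to convergence guarantees for both the exact iteration above and sampled variants of the same update.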
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Since their introduction a year ago, distributional approaches to
reinforcement learning (distributional RL) have produced strong results
relative to the standard approach, which models expected values (expected RL).
However, aside from convergence guarantees, there have been few theoretical
results investigating the reasons behind the improvements distributional RL
provides. In this paper we begin the investigation into this fundamental
question by analyzing the differences in the tabular, linear approximation, and
non-linear approximation settings. We prove that in many realizations of the
tabular and linear approximation settings, distributional RL behaves exactly
the same as expected RL. In cases where the two methods behave differently,
distributional RL can in fact hurt performance when it does not induce
identical behaviour. We then continue with an empirical analysis comparing
distributional and expected RL methods in control settings with non-linear
approximators to tease apart where the improvements from distributional RL
methods are coming from.
Comment: To appear in the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
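To make the expected/distributional contrast concrete, here is a toy sketch of both update rules on a single tabular transition, in the spirit of categorical distributional RL (C51). The support `atoms`, the step size, the point-mass initialization, and all function names are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

gamma, alpha = 0.9, 0.1
atoms = np.linspace(0.0, 10.0, 51)   # fixed categorical support (C51-style)
dz = atoms[1] - atoms[0]

def expected_td(V, s, r, s_next):
    # Expected RL: move V(s) toward the scalar target r + gamma * V(s').
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def distributional_td(P, s, r, s_next):
    # Distributional RL: shift/shrink the successor distribution by
    # z -> r + gamma * z, project back onto the fixed support, then mix.
    target = np.zeros_like(atoms)
    for p, z in zip(P[s_next], atoms):
        tz = np.clip(r + gamma * z, atoms[0], atoms[-1])
        i = (tz - atoms[0]) / dz
        lo, hi = int(np.floor(i)), int(np.ceil(i))
        if lo == hi:
            target[lo] += p
        else:
            target[lo] += p * (hi - i)
            target[hi] += p * (i - lo)
    P[s] = (1 - alpha) * P[s] + alpha * target

n_states = 2
V = np.zeros(n_states)
P = np.zeros((n_states, len(atoms)))
P[:, 0] = 1.0                        # point mass at 0, so the means start equal

expected_td(V, s=0, r=1.0, s_next=1)
distributional_td(P, s=0, r=1.0, s_next=1)
print(V[0], (P[0] * atoms).sum())    # both approx 0.1: the means move together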
Small batch deep reinforcement learning
In value-based deep reinforcement learning with replay memories, the batch
size parameter specifies how many transitions to sample for each gradient
update. Although critical to the learning process, this value is typically not
adjusted when proposing new algorithms. In this work we present a broad
empirical study that suggests reducing the batch size can result in a
number of significant performance gains; this is surprising, as the general
tendency when training neural networks is towards larger batch sizes for
improved performance. We complement our experimental findings with a set of
empirical analyses towards better understanding this phenomenon.
Comment: Published at NeurIPS 2023.
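For orientation, the parameter in question sits in the replay sampling loop: every gradient step draws `batch_size` transitions from the replay memory. A minimal, hypothetical sketch follows; the names `replay`, `q_update`, and `train_step`, and the value 8, are placeholders rather than any particular agent's API.

```python
import random
from collections import deque

replay = deque(maxlen=100_000)   # replay memory of stored transitions
batch_size = 8                   # the knob under study; the paper finds
                                 # smaller values can improve performance

def q_update(batch):
    # Placeholder for one gradient step of a value-based deep RL agent.
    pass

def train_step():
    # Each update samples `batch_size` transitions uniformly from replay.
    if len(replay) >= batch_size:
        q_update(random.sample(replay, batch_size))
```

Because the batch size enters the agent only at this sampling step, it can be swept independently of the rest of the algorithm, which is what makes a broad empirical study of it tractable.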
Metrics and continuity in reinforcement learning
In most practical applications of reinforcement learning, it is untenable to
maintain direct estimates for individual states; in continuous-state systems,
it is impossible. Instead, researchers often leverage state similarity (whether
explicitly or implicitly) to build models that can generalize well from a
limited set of samples. The notion of state similarity used, and the
neighbourhoods and topologies it induces, are thus of crucial importance,
as they will directly affect the performance of the algorithms. Indeed, a number of
recent works introduce algorithms assuming the existence of "well-behaved"
neighbourhoods, but leave the full specification of such topologies for future
work. In this paper we introduce a unified formalism for defining these
topologies through the lens of metrics. We establish a hierarchy amongst these
metrics and demonstrate their theoretical implications on the Markov Decision
Process specifying the reinforcement learning problem. We complement our
theoretical results with empirical evaluations showcasing the differences
between the metrics considered.
Comment: Accepted at AAAI 2021.
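One way to make the theoretical implications concrete: a state metric d supports generalization when the value function is Lipschitz with respect to it, i.e. |V(x) - V(y)| <= K * d(x, y) for a small K. The sketch below, with made-up values and metrics (`V`, `d_discrete`, `d_behav` are all hypothetical), estimates the smallest such K for two candidate metrics; it illustrates the kind of comparison such a hierarchy supports, not the paper's experiments.

```python
import numpy as np

def lipschitz_constant(V, d):
    # Smallest K with |V(x) - V(y)| <= K * d(x, y) over all state pairs.
    gaps = np.abs(V[:, None] - V[None, :])
    mask = d > 0
    return float(np.max(gaps[mask] / d[mask])) if mask.any() else 0.0

V = np.array([2.0, 1.9, 0.0])               # hypothetical state values

d_discrete = 1.0 - np.eye(3)                # all distinct states at distance 1
d_behav = np.array([[0.0, 0.1, 2.0],        # a behaviorally informed metric:
                    [0.1, 0.0, 1.9],        # similar-value states are close
                    [2.0, 1.9, 0.0]])

print(lipschitz_constant(V, d_discrete))    # 2.0: the discrete metric is loose
print(lipschitz_constant(V, d_behav))       # 1.0: tighter control of value gaps
```

A smaller constant means states that the metric calls close really do have close values, which is the property generalization from limited samples relies on.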